This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Enable join using common columns #198

Open
wants to merge 4 commits into
base: main

Contributor

@ruxuez commented Jun 14, 2023

Previously, when joining on common columns, only rows from
one of the two joined dataframes were selected.

This patch fixes that bug. It can also semi-automatically
detect the common columns to join on, and it renames columns
that share a name but are not used for the join.
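
For illustration only, the behavior described above can be sketched with Python's built-in sqlite3 module instead of the library this patch targets. The tables t1 and t2 and the column val are made up for the example. With USING, the shared id column appears once in the result, while val, which has the same name in both tables but is not a join column, has to be renamed to stay distinguishable.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the
# column names "id" (the join column) and "val" (not a join column).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER, val TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (2, 'x'), (3, 'y');
    """
)

# USING (id) joins on the common column and exposes it once;
# the clashing non-join column is renamed on each side.
rows = conn.execute(
    """
    SELECT id, t1.val AS t1_val, t2.val AS t2_val
    FROM t1 JOIN t2 USING (id)
    """
).fetchall()

result = [dict(r) for r in rows]
print(result)
```

This only shows the SQL-level idea; how the patch itself builds the target list and the USING clause is in the diff, not in this sketch.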

@ruxuez requested a review from xuebinsu June 14, 2023 14:42
@beeender requested a review from yihong0618 June 17, 2023 06:57
ret = t1.join(t2, on=["id"], self_columns={"id": "t1_id"}, other_columns={"id": "t2_id"})
assert sorted(next(iter(ret)).keys()) == sorted(["t1_id", "t2_id"])
ret = t1.join(t2, on=["id"], self_columns={"id"}, other_columns={"id"})
assert sorted(next(iter(ret)).keys()) == ["id"]
Contributor

There is only one item, so sorted is not needed here.

Contributor Author

You're right

f"""
SELECT {",".join(target_list)}
FROM {self._name} {how} JOIN {other_clause} {sql_on_clause} {sql_using_clause}
""",
parents=[self, other],
)
coalesce_target_list = []
if not (self_columns == {} or other_columns == {}):
Contributor

if self_columns and other_columns:
Maybe this would be better?
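
One thing worth checking before swapping the condition: the two forms are not equivalent for every input. The sketch below wraps each condition in a hypothetical helper (original_check and suggested_check are names invented here, not part of the patch) and compares them on a few inputs. Since the test above passes sets such as {"id"}, empty sets are included; an empty set is not equal to {}, which is an empty dict.

```python
def original_check(self_columns, other_columns):
    # Condition as written in the diff under review.
    return not (self_columns == {} or other_columns == {})


def suggested_check(self_columns, other_columns):
    # Truthiness-based condition proposed in the review comment.
    return bool(self_columns and other_columns)


# Hypothetical inputs chosen to expose where the two checks disagree.
cases = {
    "non-empty sets": ({"id"}, {"id"}),
    "empty dicts": ({}, {}),
    "empty sets": (set(), set()),
    "None": (None, None),
}

# Maps each case name to (original result, suggested result).
results = {
    name: (original_check(a, b), suggested_check(a, b))
    for name, (a, b) in cases.items()
}

for name, (orig, sugg) in results.items():
    print(f"{name}: original={orig}, suggested={sugg}")
```

The checks agree for non-empty sets and empty dicts but differ for empty sets and None, so the shorter form is only a drop-in replacement if those inputs should also skip the branch.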

2 participants